{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "635d8ebb",
   "metadata": {},
   "source": [
    "# Using the OpenAI API (GPT-4o Multimodal)\n",
    "\n",
    "- Author: [Erika Park](https://www.linkedin.com/in/yeonseo-park-094193198/)\n",
    "- Peer Review: [JeongGi Park](https://www.linkedin.com/in/jeonggipark/), [Wooseok Jeong](https://github.com/jeong-wooseok)\n",
    "- Proofread : [Q0211](https://github.com/Q0211)\n",
    "- This is a part of [LangChain Open Tutorial](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial)\n",
    "\n",
    "[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/01-Basic/05-Using-OpenAIAPI-MultiModal.ipynb)[![Open in GitHub](https://img.shields.io/badge/Open%20in%20GitHub-181717?style=flat-square&logo=github&logoColor=white)](https://github.com/LangChain-OpenTutorial/LangChain-OpenTutorial/blob/main/01-Basic/05-Using-OpenAIAPI-MultiModal.ipynb)\n",
    "\n",
    "## Overview\n",
    "\n",
    "This tutorial explains how to effectively use OpenAI's ```GPT-4o``` multimodal model with ```LangChain```, a versatile framework for building language model applications. You'll learn to set up and work with the ```ChatOpenAI``` object for tasks such as generating responses, analyzing model outputs, and leveraging advanced features like real-time response streaming and token log probability analysis. By the end of this guide, you'll have the tools to experiment with and deploy sophisticated AI solutions smoothly and efficiently.\n",
    "\n",
    "\n",
    "### Table of Contents\n",
    "\n",
    "- [Overview](#overview)\n",
    "- [Environment Setup](#environment-setup)\n",
    "- [ChatOpenAI GPT-4o Multimodal](#chatopenai-gpt-4o-multimodal)\n",
    "- [Multimodal AI: Text and Image Processing with GPT-4o](#multimodal-ai-text-and-image-processing-with-gpt-4o)\n",
    "- [Configuring Multimodal AI with System and User Prompts](#configuring-multimodal-ai-with-system-and-user-prompts)\n",
    "\n",
    "\n",
    "### References\n",
    "\n",
    "- [OpenAI Model Overview](https://platform.openai.com/docs/models)\n",
    "\n",
    "----"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "28dfd3e1",
   "metadata": {},
   "source": [
    "## Environment Setup\n",
    "\n",
    "Set up the environment. You may refer to [Environment Setup](https://wikidocs.net/257836) for more details.\n",
    "\n",
    "**[Note]**\n",
    "- ```langchain-opentutorial``` is a package that provides easy-to-use environment setup, useful functions, and utilities for tutorials.\n",
    "- Check out [```langchain-opentutorial```](https://github.com/LangChain-OpenTutorial/langchain-opentutorial-pypi) for more details."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "id": "cbfb7682",
   "metadata": {},
   "outputs": [],
   "source": [
    "%%capture --no-stderr\n",
    "%pip install langchain-opentutorial"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "id": "f7705b37",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Install required packages\n",
    "from langchain_opentutorial import package\n",
    "\n",
    "package.install(\n",
    "    [\"langchain\", \"langchain_openai\"],\n",
    "    verbose=False,\n",
    "    upgrade=False,\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "id": "4454c8ed",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Environment variables have been set successfully.\n"
     ]
    }
   ],
   "source": [
    "# Set environment variables\n",
    "from langchain_opentutorial import set_env\n",
    "\n",
    "set_env(\n",
    "    {\n",
    "        # \"OPENAI_API_KEY\": \"\",\n",
    "        # \"LANGCHAIN_API_KEY\": \"\",\n",
    "        \"LANGCHAIN_TRACING_V2\": \"true\",\n",
    "        \"LANGCHAIN_ENDPOINT\": \"https://api.smith.langchain.com\",\n",
    "        \"LANGCHAIN_PROJECT\": \"Using-OpenAI-API\",\n",
    "    }\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "170b0d8c",
   "metadata": {},
   "source": [
    "You can alternatively set API keys such as ```OPENAI_API_KEY``` in a ```.env``` file and load them.\n",
    "\n",
    "[Note] This is not necessary if you've already set the required API keys in previous steps."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "id": "17971dad",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "True"
      ]
     },
     "execution_count": 4,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Configuration file to manage the API KEY as an environment variable\n",
    "from dotenv import load_dotenv\n",
    "\n",
    "# Load API KEY information\n",
    "load_dotenv(override=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "6a95de64",
   "metadata": {},
   "source": [
    "## ChatOpenAI GPT-4o Multimodal\n",
    "\n",
    "```ChatOpenAI``` is LangChain's interface to the chat-specific Large Language Models (LLMs) provided by OpenAI.\n",
    "\n",
    "When creating a ```ChatOpenAI``` object, the following options can be specified:\n",
    "\n",
    "```temperature```\n",
    "\n",
    "- Specifies the sampling temperature, which can be chosen between 0 and 2. A higher value, such as 0.8, results in more random outputs, while a lower value, such as 0.2, makes the outputs more focused and deterministic.\n",
    "\n",
    "```max_tokens```\n",
    "\n",
    "- The maximum number of tokens to generate for the chat completion.\n",
    "\n",
    "```model_name``` : List of available models\n",
    "- ```gpt-4o```\n",
    "- ```gpt-4o-mini```\n",
    "- ```o1-preview```, ```o1-mini``` : Available only to Tier 5 accounts, which require at least $1,000 in cumulative payments to access.\n",
    "\n",
    "![gpt-models.png](./assets/04-using-openai-api-gpt4o-get-models.png)\n",
    "\n",
    "- Link: https://platform.openai.com/docs/models\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "id": "c2f2222d",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Answer]: content='The capital of the United States is Washington, D.C.' additional_kwargs={'refusal': None} response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 14, 'total_tokens': 27, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0aa8d3e20b', 'finish_reason': 'stop', 'logprobs': None} id='run-513b84b7-4d52-4256-9af1-1713ba4f4930-0' usage_metadata={'input_tokens': 14, 'output_tokens': 13, 'total_tokens': 27, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}}\n"
     ]
    }
   ],
   "source": [
    "from langchain_openai.chat_models import ChatOpenAI\n",
    "\n",
    "# Create the ChatOpenAI object\n",
    "llm = ChatOpenAI(\n",
    "    temperature=0.1,  # Creativity (range: 0.0 ~ 2.0)\n",
    "    model_name=\"gpt-4o-mini\",  # Model name\n",
    ")\n",
    "\n",
    "question = \"What is the capital of the USA?\"\n",
    "\n",
    "print(f\"[Answer]: {llm.invoke(question)}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "b3cbd1b7",
   "metadata": {},
   "source": [
    "### Response Format (AI Message)\n",
    "When using the ```ChatOpenAI``` object, the response is returned in the format of an AI Message. This includes the text content generated by the model along with any metadata or additional properties associated with the response. These provide structured information about the AI's reply and how it was generated.\n",
    "\n",
    "**Key Components of AI Message**\n",
    "1. **```content```**  \n",
    "   - **Definition:** The primary response text generated by the AI.  \n",
    "   - **Example:** **\"The capital of the United States is Washington, D.C.\"**\n",
    "   - **Purpose:** This is the main part of the response that users interact with.\n",
    "\n",
    "2. **```response_metadata```**  \n",
    "   - **Definition:** Metadata about the response generation process.  \n",
    "   - **Key Fields:**\n",
    "     - **```model_name``` :** Name of the model used (e.g., ```\"gpt-4o-mini\"``` ).\n",
    "     - **```finish_reason``` :** Reason the generation stopped (**stop** for normal completion).\n",
    "     - **```token_usage``` :** Token usage details:\n",
    "       - **```prompt_tokens``` :** Tokens used for the input query.\n",
    "       - **```completion_tokens``` :** Tokens used for the response.\n",
    "       - **```total_tokens``` :** Combined token count.\n",
    "\n",
    "3. **```id```**  \n",
    "   - **Definition:** A unique identifier for the API call.  \n",
    "   - **Purpose:** Useful for tracking or debugging specific interactions."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "id": "05170153",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/plain": [
       "AIMessage(content='The capital of the United States is Washington, D.C.', additional_kwargs={'refusal': None}, response_metadata={'token_usage': {'completion_tokens': 13, 'prompt_tokens': 14, 'total_tokens': 27, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0aa8d3e20b', 'finish_reason': 'stop', 'logprobs': None}, id='run-16669141-7244-4cf4-91dd-8c3f1efc8d24-0', usage_metadata={'input_tokens': 14, 'output_tokens': 13, 'total_tokens': 27, 'input_token_details': {'audio': 0, 'cache_read': 0}, 'output_token_details': {'audio': 0, 'reasoning': 0}})"
      ]
     },
     "execution_count": 6,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Query content\n",
    "question = \"What is the capital of the USA?\"\n",
    "\n",
    "# Query\n",
    "response = llm.invoke(question)\n",
    "response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "id": "92bddfef",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Response: The capital of the United States is Washington, D.C.\n",
      "Model: gpt-4o-mini-2024-07-18\n",
      "Total Tokens Used: 27\n"
     ]
    }
   ],
   "source": [
    "# Extract key components\n",
    "content = response.content  # AI's response text\n",
    "model_name = response.response_metadata[\"model_name\"]\n",
    "total_tokens = response.response_metadata[\"token_usage\"][\"total_tokens\"]\n",
    "\n",
    "# Print results\n",
    "print(f\"Response: {content}\")\n",
    "print(f\"Model: {model_name}\")\n",
    "print(f\"Total Tokens Used: {total_tokens}\")"
   ]
  },
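  {
   "cell_type": "markdown",
   "id": "usage-metadata-note",
   "metadata": {},
   "source": [
    "Token counts are also exposed in a provider-agnostic form via ```usage_metadata``` (visible in the ```AIMessage``` above). The sketch below uses values copied from that output in place of a live ```response```, so it runs on its own:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "usage-metadata-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# usage_metadata values copied from the AIMessage output above.\n",
    "# With a live response this would be: usage_metadata = response.usage_metadata\n",
    "usage_metadata = {'input_tokens': 14, 'output_tokens': 13, 'total_tokens': 27}\n",
    "\n",
    "# total_tokens is simply the sum of input and output tokens\n",
    "computed_total = usage_metadata['input_tokens'] + usage_metadata['output_tokens']\n",
    "print('input + output =', computed_total)\n",
    "print('reported total =', usage_metadata['total_tokens'])"
   ]
  },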
  {
   "cell_type": "markdown",
   "id": "f0eb83f5",
   "metadata": {},
   "source": [
    "### Activating ```LogProb```\n",
    "\n",
    "```LogProb``` represents the **logarithmic probabilities** assigned by the model to predicted tokens. A token is an individual unit of text, such as a word, character, or part of a word. The probability indicates the **model's confidence in predicting each token**.\n",
    "\n",
    "**Use Cases**:\n",
    "```LogProb``` is useful for evaluating the model's prediction confidence, debugging issues, and optimizing prompts. By analyzing ```LogProb``` data, you can understand why the model selected specific tokens.\n",
    "\n",
    "**Caution**:\n",
    "Enabling ```LogProb``` increases the response data size, which may affect API speed and cost. It is recommended to activate it only when necessary."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "id": "7ed53fa2",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Object creation with LogProb enabled\n",
    "llm_with_logprob = ChatOpenAI(\n",
    "    temperature=0.1, max_tokens=2048, model_name=\"gpt-4o-mini\"\n",
    ").bind(\n",
    "    logprobs=True\n",
    ")  # Activating LogProb to retrieve token-level probabilities"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "id": "b1fe5071",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "{'token_usage': {'completion_tokens': 9, 'prompt_tokens': 14, 'total_tokens': 23, 'completion_tokens_details': {'accepted_prediction_tokens': 0, 'audio_tokens': 0, 'reasoning_tokens': 0, 'rejected_prediction_tokens': 0}, 'prompt_tokens_details': {'audio_tokens': 0, 'cached_tokens': 0}}, 'model_name': 'gpt-4o-mini-2024-07-18', 'system_fingerprint': 'fp_0aa8d3e20b', 'finish_reason': 'stop', 'logprobs': {'content': [{'token': 'The', 'bytes': [84, 104, 101], 'logprob': 0.0, 'top_logprobs': []}, {'token': ' capital', 'bytes': [32, 99, 97, 112, 105, 116, 97, 108], 'logprob': 0.0, 'top_logprobs': []}, {'token': ' of', 'bytes': [32, 111, 102], 'logprob': 0.0, 'top_logprobs': []}, {'token': ' India', 'bytes': [32, 73, 110, 100, 105, 97], 'logprob': 0.0, 'top_logprobs': []}, {'token': ' is', 'bytes': [32, 105, 115], 'logprob': 0.0, 'top_logprobs': []}, {'token': ' New', 'bytes': [32, 78, 101, 119], 'logprob': -3.7742768e-05, 'top_logprobs': []}, {'token': ' Delhi', 'bytes': [32, 68, 101, 108, 104, 105], 'logprob': -4.3202e-07, 'top_logprobs': []}, {'token': '.', 'bytes': [46], 'logprob': -5.5122365e-07, 'top_logprobs': []}], 'refusal': None}}\n"
     ]
    }
   ],
   "source": [
    "# Query content\n",
    "question = \"What is the capital of India?\"\n",
    "\n",
    "# Query\n",
    "response = llm_with_logprob.invoke(question)\n",
    "\n",
    "# Display the response metadata\n",
    "print(response.response_metadata)"
   ]
  },
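  {
   "cell_type": "markdown",
   "id": "logprob-conversion-note",
   "metadata": {},
   "source": [
    "Each ```logprob``` is a natural-log probability, so applying ```math.exp``` recovers the plain probability (a value between 0 and 1). Below is a minimal, self-contained sketch of that conversion, using two token entries copied from the output above; with a live response you would iterate over ```response.response_metadata[\"logprobs\"][\"content\"]``` instead:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "logprob-conversion-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "import math\n",
    "\n",
    "# Sample entries copied from the logprobs output above\n",
    "sample_logprobs = [\n",
    "    {'token': ' New', 'logprob': -3.7742768e-05},\n",
    "    {'token': ' Delhi', 'logprob': -4.3202e-07},\n",
    "]\n",
    "\n",
    "for entry in sample_logprobs:\n",
    "    probability = math.exp(entry['logprob'])  # natural-log probability -> probability\n",
    "    print(entry['token'], round(probability, 6))"
   ]
  },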
  {
   "cell_type": "markdown",
   "id": "b123a6a9",
   "metadata": {},
   "source": [
    "### Streaming Output\n",
    "\n",
    "The streaming option is particularly useful for receiving real-time responses to queries.\n",
    "\n",
    "Instead of waiting for the entire response to be generated, the model streams the output token by token or in chunks, enabling faster interaction and immediate feedback."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "id": "dea59d60",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Sure! Here are 10 beautiful tourist destinations in the USA along with their addresses:\n",
      "\n",
      "1. **Grand Canyon National Park**\n",
      "   - Address: Grand Canyon Village, AZ 86023\n",
      "\n",
      "2. **Yosemite National Park**\n",
      "   - Address: 9035 Village Dr, Yosemite Valley, CA 95389\n",
      "\n",
      "3. **Yellowstone National Park**\n",
      "   - Address: 2 J. G. P. Rd, Yellowstone National Park, WY 82190\n",
      "\n",
      "4. **Niagara Falls**\n",
      "   - Address: Niagara Falls, NY 14303\n",
      "\n",
      "5. **Maui, Hawaii**\n",
      "   - Address: Maui, HI (specific locations vary, e.g., Lahaina, Kihei)\n",
      "\n",
      "6. **Sedona, Arizona**\n",
      "   - Address: Sedona, AZ 86336\n",
      "\n",
      "7. **Savannah, Georgia**\n",
      "   - Address: Savannah, GA 31401 (Historic District)\n",
      "\n",
      "8. **New Orleans, Louisiana**\n",
      "   - Address: New Orleans, LA 70112 (French Quarter)\n",
      "\n",
      "9. **Acadia National Park**\n",
      "   - Address: 20 McFarland Hill Dr, Bar Harbor, ME 04609\n",
      "\n",
      "10. **Washington, D.C. (National Mall)**\n",
      "    - Address: 900 Ohio Dr SW, Washington, DC 20024\n",
      "\n",
      "These destinations offer stunning natural beauty, rich history, and unique cultural experiences. Enjoy your travels!"
     ]
    }
   ],
   "source": [
    "answer = llm.stream(\n",
    "    \"Please provide 10 beautiful tourist destinations in USA along with their addresses!\"\n",
    ")\n",
    "\n",
    "# Streaming real-time output\n",
    "for token in answer:\n",
    "    print(token.content, end=\"\", flush=True)"
   ]
  },
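  {
   "cell_type": "markdown",
   "id": "stream-accumulate-note",
   "metadata": {},
   "source": [
    "If you need the complete text after streaming finishes, accumulate the chunks while printing. The sketch below is self-contained: the ```chunks``` list stands in for the ```token.content``` strings yielded by ```llm.stream()```."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "stream-accumulate-code",
   "metadata": {},
   "outputs": [],
   "source": [
    "# 'chunks' stands in for the token.content strings yielded by llm.stream()\n",
    "chunks = ['The', ' capital', ' of', ' the', ' USA', ' is', ' Washington, D.C.']\n",
    "\n",
    "final_answer = ''\n",
    "for chunk in chunks:\n",
    "    print(chunk, end='', flush=True)  # real-time display\n",
    "    final_answer += chunk             # keep the full text for later use\n",
    "\n",
    "print()\n",
    "print(len(final_answer), 'characters accumulated')"
   ]
  },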
  {
   "cell_type": "markdown",
   "id": "a8295bc8",
   "metadata": {},
   "source": [
    "## Multimodal AI: Text and Image Processing with GPT-4o\n",
    "\n",
    "Multimodal refers to technologies or approaches that integrate and process multiple types of information (modalities). This includes a variety of data types such as:\n",
    "\n",
    "- Text: Information in written form, such as documents, books, or web pages.\n",
    "- Image: Visual information, including photos, graphics, or illustrations.\n",
    "- Audio: Auditory information, such as speech, music, or sound effects.\n",
    "- Video: A combination of visual and auditory information, including video clips or real-time streaming.\n",
    "\n",
    "```gpt-4o``` and ```gpt-4-turbo``` are equipped with vision capabilities, enabling them to process and recognize images alongside textual inputs. "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "d783c005",
   "metadata": {},
   "source": [
    "### Step 1: Setting up ChatOpenAI\n",
    "\n",
    "First, create a ```ChatOpenAI``` object with the ```gpt-4o``` model and streaming capabilities enabled."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "id": "2c0393e7",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the ChatOpenAI object\n",
    "llm = ChatOpenAI(\n",
    "    temperature=0.1,\n",
    "    model_name=\"gpt-4o\",\n",
    "    streaming=True,  # Enable streaming for real-time output\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5596e60a",
   "metadata": {},
   "source": [
    "### Step 2: Encoding Images\n",
    "Images need to be encoded into **Base64** format for the model to process them. The following function handles both URL-based and local image files:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "id": "506405fe",
   "metadata": {},
   "outputs": [],
   "source": [
    "import requests\n",
    "import base64\n",
    "import mimetypes\n",
    "from IPython.display import display, HTML, Image\n",
    "\n",
    "\n",
    "def encode_image(image_path_or_url):\n",
    "    if image_path_or_url.startswith(\"http://\") or image_path_or_url.startswith(\n",
    "        \"https://\"\n",
    "    ):\n",
    "        # Download image from URL\n",
    "        response = requests.get(image_path_or_url)\n",
    "        if response.status_code == 200:\n",
    "            image_content = response.content\n",
    "        else:\n",
    "            raise Exception(f\"Failed to download image: {response.status_code}\")\n",
    "        # Guess MIME type based on URL\n",
    "        mime_type, _ = mimetypes.guess_type(image_path_or_url)\n",
    "        if mime_type is None:\n",
    "            mime_type = (\n",
    "                \"application/octet-stream\"  # Default MIME type for unknown files\n",
    "            )\n",
    "    else:\n",
    "        # Read image from local file\n",
    "        try:\n",
    "            with open(image_path_or_url, \"rb\") as image_file:\n",
    "                image_content = image_file.read()\n",
    "            # Guess MIME type based on file extension\n",
    "            mime_type, _ = mimetypes.guess_type(image_path_or_url)\n",
    "            if mime_type is None:\n",
    "                mime_type = (\n",
    "                    \"application/octet-stream\"  # Default MIME type for unknown files\n",
    "                )\n",
    "        except FileNotFoundError:\n",
    "            raise Exception(f\"File not found: {image_path_or_url}\")\n",
    "\n",
    "    # Base64 encode the image\n",
    "    return f\"data:{mime_type};base64,{base64.b64encode(image_content).decode()}\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "1a41a3db",
   "metadata": {},
   "source": [
    "**Example: Encode and Display an Image** \n",
    "\n",
    "* URL-based Image:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "id": "5a4f1fa1",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "IMAGE_URL = \"https://t3.ftcdn.net/jpg/03/77/33/96/360_F_377339633_Rtv9I77sSmSNcev8bEcnVxTHrXB4nRJ5.jpg\"\n",
    "encoded_image_url = encode_image(IMAGE_URL)\n",
    "display(Image(url=encoded_image_url))  # Display the image"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "57b49c6d",
   "metadata": {},
   "source": [
    "* Local Image:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "id": "53c770ff",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"\" alt=\"Image\" style=\"max-width: 100%; height: auto;\">"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "IMAGE_PATH = \"./assets/04-using-openai-api-gpt4o-sample-image.png\"\n",
    "encoded_image_file = encode_image(IMAGE_PATH)\n",
    "html_code = f'<img src=\"{encoded_image_file}\" alt=\"Image\" style=\"max-width: 100%; height: auto;\">'\n",
    "display(HTML(html_code))  # Display the image"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "9b8b0b48",
   "metadata": {},
   "source": [
    "### Step 3: Creating Messages\n",
    "Define a function to generate the messages required for the model. This includes:\n",
    "\n",
    "- **System Prompt**: Defines the role and task for the AI.\n",
    "- **User Prompt**: Provides the specific task instructions.\n",
    "- **Encoded Image**: Includes the Base64 image data."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "id": "d8fb5764",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Function to create messages for the AI\n",
    "def create_messages(encoded_image):\n",
    "    system_prompt = \"You are a helpful assistant on parsing images.\"\n",
    "    user_prompt = \"Explain the given images in-depth.\"\n",
    "    return [\n",
    "        {\"role\": \"system\", \"content\": system_prompt},\n",
    "        {\n",
    "            \"role\": \"user\",\n",
    "            \"content\": [\n",
    "                {\"type\": \"text\", \"text\": user_prompt},\n",
    "                {\"type\": \"image_url\", \"image_url\": {\"url\": encoded_image}},\n",
    "            ],\n",
    "        },\n",
    "    ]"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "50ba577c",
   "metadata": {},
   "source": [
    "### Step 4: Model Interaction\n",
    "Now, send the generated messages to the model and stream the results in real time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "id": "1922dadb",
   "metadata": {},
   "outputs": [],
   "source": [
    "def stream_response(llm, messages):\n",
    "    response = llm.stream(messages)  # Stream AI response\n",
    "    print(\"Streaming response:\")\n",
    "    for chunk in response:\n",
    "        print(\n",
    "            chunk.content, end=\"\", flush=True\n",
    "        )  # Print each response chunk in real time"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 17,
   "id": "777b1dae",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"https://t3.ftcdn.net/jpg/03/77/33/96/360_F_377339633_Rtv9I77sSmSNcev8bEcnVxTHrXB4nRJ5.jpg\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Streaming response:\n",
      "The image is a table with a header labeled \"TABLE 001: LOREM IPSUM DOLOR AMIS ENIMA ACCUMER TUNA.\" It contains five columns with the following headings:\n",
      "\n",
      "1. **Loremis**\n",
      "2. **Amis terim**\n",
      "3. **Gāto lepis**\n",
      "4. **Tortores**\n",
      "\n",
      "Each row under these headings contains various data points:\n",
      "\n",
      "- **Lorem dolor siamet**: 8,288, 123%, YES, $89\n",
      "- **Consecter odio**: 123, 87%, NO, $129\n",
      "- **Gatoque accums**: 1,005, 12%, NO, $199\n",
      "- **Sed hac enim rem**: 56, 69%, N/A, $199\n",
      "- **Rempus tortor just**: 5,554, 18%, NO, $999\n",
      "- **Klimas nsecter**: 455, 56%, NO, $245\n",
      "- **Babiask atoque accu**: 1,222, 2%, YES, $977\n",
      "- **Enim rem kos**: 5,002, 91%, N/A, $522\n",
      "\n",
      "The table uses placeholder text (\"Lorem ipsum\") for both the title and the row labels, which is commonly used in design to fill space until actual content is available. The data appears to be numerical and categorical, with percentages, binary options (YES/NO), and monetary values. The last row of text is a continuation of the placeholder text, providing no additional information."
     ]
    }
   ],
   "source": [
    "# Display the image\n",
    "display(Image(url=IMAGE_URL))\n",
    "encoded_image_url = encode_image(IMAGE_URL)\n",
    "\n",
    "#  Create messages and stream responses\n",
    "messages_url = create_messages(encoded_image_url)\n",
    "stream_response(llm, messages_url)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "id": "2af66a9a",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"\" alt=\"Image\" style=\"max-width: 100%; height: auto;\">"
      ],
      "text/plain": [
       "<IPython.core.display.HTML object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Streaming response:\n",
      "The image is an informational poster about the \"First OpenAI DevDay Event\" held on November 6, 2023. It highlights several key updates and features introduced during the event. Here's a detailed breakdown:\n",
      "\n",
      "### Event Details:\n",
      "- **Title:** First OpenAI DevDay Event\n",
      "- **Date:** November 6, 2023\n",
      "- **Key Announcements:**\n",
      "  - GPT 4 Turbo\n",
      "  - 128k Tokens\n",
      "  - Custom GPTs\n",
      "  - Assistant API\n",
      "  - Price Reduction\n",
      "\n",
      "### Source:\n",
      "- A YouTube link is provided for more information: [OpenAI DevDay Event](https://www.youtube.com/watch?v=U9mjJuUkhUzk)\n",
      "\n",
      "### Main Updates Summarized:\n",
      "1. **Token Length:** \n",
      "   - Increased to 128K tokens.\n",
      "   \n",
      "2. **Custom GPTs:**\n",
      "   - Available as private or public options.\n",
      "   \n",
      "3. **Multi Modal:**\n",
      "   - Supports image, video, and voice inputs.\n",
      "   \n",
      "4. **JSON Mode:**\n",
      "   - Guaranteed functionality.\n",
      "   \n",
      "5. **Assistant API:**\n",
      "   - Available for developers.\n",
      "   \n",
      "6. **Text to Speech:**\n",
      "   - In beta release.\n",
      "   \n",
      "7. **Natural Voice Options:**\n",
      "   - Offers 6 different voices.\n",
      "   \n",
      "8. **GPT Store:**\n",
      "   - Revenue sharing model.\n",
      "   \n",
      "9. **Conversation Threading:**\n",
      "   - Organized per conversation.\n",
      "   \n",
      "10. **File Uploading:**\n",
      "    - Supports multiple files.\n",
      "    \n",
      "11. **API Price Reduction:**\n",
      "    - Reduced by 2.5x to 3.5x.\n",
      "    \n",
      "12. **Code Interpreter:**\n",
      "    - Built-in feature.\n",
      "    \n",
      "13. **Function Calling:**\n",
      "    - Built-in feature.\n",
      "\n",
      "### Branding:\n",
      "- The poster includes the logo and branding of \"Astra Techz,\" with the tagline \"Simplifying Technology.\"\n",
      "\n",
      "### Footer:\n",
      "- A call to action to visit [www.astratechz.com](http://www.astratechz.com) for building AI solutions.\n",
      "\n",
      "This poster serves as a concise summary of the new features and improvements announced at the OpenAI DevDay event, aimed at developers and businesses interested in AI advancements."
     ]
    }
   ],
   "source": [
    "# Encoding image\n",
    "IMAGE_PATH = \"./assets/04-using-openai-api-gpt4o-sample-image.png\"\n",
    "encoded_image_file = encode_image(IMAGE_PATH)\n",
    "html_code = f'<img src=\"{encoded_image_file}\" alt=\"Image\" style=\"max-width: 100%; height: auto;\">'\n",
    "display(HTML(html_code))\n",
    "\n",
    "# Create messages and stream responses\n",
    "messages_file = create_messages(encoded_image_file)\n",
    "stream_response(llm, messages_file)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "49e63b32",
   "metadata": {},
   "source": [
    "## Configuring Multimodal AI with System and User Prompts\n",
    "This section demonstrates how to configure a multimodal AI using **system prompts** and **user prompts** , and how to process and interpret an image-based financial table.\n",
    "\n",
    "\n",
    "### What Are Prompts? \n",
    "\n",
    "**System Prompt**  \n",
    "Defines the AI's identity, responsibilities, and behavior for the session:\n",
    "\n",
    "* Sets the AI's context, ensuring consistent responses.\n",
    "* Example: \"You are a financial assistant specializing in interpreting tables.\"\n",
    "\n",
    "**User Prompt**  \n",
    "Gives task-specific instructions to guide the AI:\n",
    "\n",
    "* Specifies what the user expects the AI to do.\n",
    "* Example: \"Analyze this financial table and summarize the insights.\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "f5936926",
   "metadata": {},
   "source": [
    "### Step 1: Set Up the ChatOpenAI Object\n",
    "The ```ChatOpenAI``` object initializes the model with the desired configurations, such as temperature and model type."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "id": "1e7394c4",
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create the ChatOpenAI object\n",
    "llm = ChatOpenAI(temperature=0.1, model_name=\"gpt-4o\", streaming=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "id": "aff6f630",
   "metadata": {},
   "source": [
    "### Step 2: Encode and Display the Image\n",
    "Images need to be encoded into Base64 format so the AI can process them. "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "id": "97b6dd22",
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<img src=\"\"/>"
      ],
      "text/plain": [
       "<IPython.core.display.Image object>"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "IMAGE_URL = \"https://media.wallstreetprep.com/uploads/2022/05/24100154/NVIDIA-Income-Statement.jpg?_gl=1*zqx63z*_gcl_au*MTI3Njg2MTE3Mi4xNzM1NDg1OTky*_ga*Mjg1MjY3NTAzLjE3MzU0ODU5OTI.*_ga_0X18K5P59L*MTczNTQ4NTk5MS4xLjAuMTczNTQ4NTk5MS42MC4wLjE1OTkyODA0MTI.\"\n",
    "\n",
    "encoded_image_url = encode_image(IMAGE_URL)\n",
    "display(Image(url=encoded_image_url))  # Display the original image."
   ]
  },
  {
   "cell_type": "markdown",
   "id": "0772154c",
   "metadata": {},
   "source": [
    "### Step 3: Define System and User Prompts\n",
    "Set up the prompts to guide the AI’s behavior and task execution."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "id": "1f4d54c9",
   "metadata": {},
   "outputs": [],
   "source": [
    "# System prompt: Describe the AI's role and responsibilities\n",
    "system_prompt = \"\"\"You are a financial AI assistant specializing in interpreting tables (financial statements).\n",
    "Your mission is to analyze the provided table-format financial statements and summarize interesting insights in a friendly and clear manner.\"\"\"\n",
    "\n",
    "# User prompt: Provide instructions for the task\n",
    "user_prompt = \"\"\"The table provided to you represents a company's financial statements. Summarize the interesting insights from the table.\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "id": "7f06b709",
   "metadata": {},
   "source": [
    "### Step 4: Create Messages for the AI\n",
    "Combine the system prompt, user prompt, and the encoded image into a structured message format."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "id": "0180888a",
   "metadata": {},
   "outputs": [],
   "source": [
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": system_prompt},\n",
    "    {\n",
    "        \"role\": \"user\",\n",
    "        \"content\": [\n",
    "            {\"type\": \"text\", \"text\": user_prompt},\n",
    "            {\"type\": \"image_url\", \"image_url\": {\"url\": encoded_image_url}},\n",
    "        ],\n",
    "    },\n",
    "]"
   ]
  },
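  {
   "cell_type": "markdown",
   "id": "e5b6c7d8",
   "metadata": {},
   "source": [
    "If you prefer LangChain's message classes over plain dictionaries, the same structure can be expressed with ```SystemMessage``` and ```HumanMessage``` objects. An equivalent sketch, assuming the ```system_prompt```, ```user_prompt```, and ```encoded_image_url``` variables defined above:\n",
    "\n",
    "```python\n",
    "from langchain_core.messages import HumanMessage, SystemMessage\n",
    "\n",
    "messages = [\n",
    "    SystemMessage(content=system_prompt),\n",
    "    HumanMessage(\n",
    "        content=[\n",
    "            {'type': 'text', 'text': user_prompt},\n",
    "            {'type': 'image_url', 'image_url': {'url': encoded_image_url}},\n",
    "        ]\n",
    "    ),\n",
    "]\n",
    "```"
   ]
  },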
  {
   "cell_type": "markdown",
   "id": "459a24dc",
   "metadata": {},
   "source": [
    "### Step 5: Stream the AI's Response\n",
    "Use the AI model to process the messages and stream the results in real time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "id": "9d762d82",
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Streaming response:\n",
      "Here's a summary of the financial insights from the table:\n",
      "\n",
      "1. **Revenue Growth**: The company experienced significant revenue growth over the three years. Revenue increased from $10,918 million in 2020 to $26,914 million in 2022, showing strong business expansion.\n",
      "\n",
      "2. **Gross Profit Increase**: Gross profit also rose substantially, from $6,768 million in 2020 to $17,475 million in 2022, indicating improved profitability and efficient cost management.\n",
      "\n",
      "3. **Operating Expenses**: Operating expenses increased over the years, with research and development costs rising from $2,829 million in 2020 to $5,268 million in 2022. This suggests a focus on innovation and product development.\n",
      "\n",
      "4. **Net Income Growth**: Net income saw a remarkable increase, more than tripling from $2,796 million in 2020 to $9,752 million in 2022. This reflects overall improved financial performance.\n",
      "\n",
      "5. **Earnings Per Share (EPS)**: Both basic and diluted EPS showed significant growth. Basic EPS increased from $1.15 in 2020 to $3.91 in 2022, while diluted EPS rose from $1.13 to $3.85, indicating higher returns for shareholders.\n",
      "\n",
      "6. **Income Before Tax**: Income before income tax increased from $2,970 million in 2020 to $9,941 million in 2022, showing strong operational performance.\n",
      "\n",
      "Overall, the company demonstrated robust growth in revenue, profitability, and shareholder returns over the three-year period."
     ]
    }
   ],
   "source": [
    "def stream_response(llm, messages):\n",
    "    response = llm.stream(messages)  # Stream AI response\n",
    "    print(\"Streaming response:\")\n",
    "    for chunk in response:\n",
    "        print(\n",
    "            chunk.content, end=\"\", flush=True\n",
    "        )  # Print each response chunk in real time\n",
    "\n",
    "\n",
    "# Execute streaming\n",
    "stream_response(llm, messages)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "langchain-opentutorial-bMU5IxA3-py3.11",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.11"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
